PDBProp - Working With a Single PDB Structure

This notebook is a tutorial on the PDBProp object, focusing on how chains are handled and how a sequence is mapped to a structure.

Input: PDB ID
Output: PDBProp object

Imports

In [ ]:
from ssbio.databases.pdb import PDBProp
from ssbio.databases.uniprot import UniProtProp
In [ ]:
import sys
import logging
In [ ]:
# Create logger
logger = logging.getLogger()
logger.setLevel(logging.DEBUG)  # SET YOUR LOGGING LEVEL HERE #
In [ ]:
# Send log output to stderr so that it shows up in Jupyter notebooks
handler = logging.StreamHandler(sys.stderr)
formatter = logging.Formatter('[%(asctime)s] [%(name)s] %(levelname)s: %(message)s', datefmt="%Y-%m-%d %H:%M")
handler.setFormatter(formatter)
logger.handlers = [handler]
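
To see how the logging level filters messages, here is a small self-contained sketch. It reuses the format string from the cell above but writes to an in-memory stream so the result can be inspected; the logger name `demo` and the two messages are made up for illustration.

```python
import io
import logging

# Same format string as in the notebook cell, written to an in-memory stream.
stream = io.StringIO()
demo_handler = logging.StreamHandler(stream)
demo_handler.setFormatter(logging.Formatter(
    '[%(asctime)s] [%(name)s] %(levelname)s: %(message)s',
    datefmt="%Y-%m-%d %H:%M"))

demo_logger = logging.getLogger("demo")  # hypothetical logger name
demo_logger.setLevel(logging.INFO)       # DEBUG messages are dropped at this level
demo_logger.handlers = [demo_handler]    # replaces any handlers already attached
demo_logger.propagate = False            # keep the demo output out of the root logger

demo_logger.debug("hidden at INFO level")
demo_logger.info("shown at INFO level")

output = stream.getvalue()
print(output)
```

Lowering the level to `logging.DEBUG`, as the notebook does, would let both messages through.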

Basic methods

In [ ]:
my_structure = PDBProp(ident='5T4Q', description='E. coli ATP synthase')

Download the structure

Downloading will:

- Download the chosen file type to the specified output directory
- Parse the PDB header to fill in the metadata fields

In [ ]:
import tempfile
my_structure.download_structure_file(outdir=tempfile.gettempdir(), file_type='mmtf')
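
As a rough illustration of the first step, the sketch below builds a download URL and a destination path for an entry without touching the network. It is not ssbio's implementation: `build_download_target` is a hypothetical helper, the local file naming is an assumption, and only the `pdb` and `cif` file types served from `files.rcsb.org/download` are covered.

```python
import os
import tempfile

# Assumption: RCSB serves entry files at this address for the "pdb" and "cif" types.
RCSB_DOWNLOAD_URL = "https://files.rcsb.org/download/{pdb_id}.{file_type}"
SKETCH_FILE_TYPES = ("pdb", "cif")


def build_download_target(pdb_id, file_type, outdir):
    """Hypothetical helper: return (url, local_path) for one PDB entry."""
    file_type = file_type.lower()
    if file_type not in SKETCH_FILE_TYPES:
        raise ValueError("this sketch only handles: " + ", ".join(SKETCH_FILE_TYPES))
    file_name = "{}.{}".format(pdb_id.upper(), file_type)
    url = RCSB_DOWNLOAD_URL.format(pdb_id=pdb_id.upper(), file_type=file_type)
    return url, os.path.join(outdir, file_name)


url, local_path = build_download_target("5t4q", "cif", tempfile.gettempdir())
print(url)
print(local_path)
```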

View all attributes

In [ ]:
my_structure.get_dict()

Set chains that we are interested in (if any)

The mapped_chains attribute lets us limit sequence analyses to specific chains (see the later section where we align a sequence to this structure). In this example, ATP synthase is a complex of several protein chains, so if we are interested in the product of a specific gene, we can record the chains that correspond to it.

In [ ]:
# Chains A, B, and C make up ATP synthase subunit alpha - from the gene b3734 (UniProt ID P0ABB0)
my_structure.add_mapped_chain_ids(['A', 'B', 'C'])
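
The idea of restricting an analysis to mapped chains can be sketched in plain Python. This is only an illustration, not ssbio code: `select_mapped` is a hypothetical helper, the sequences are made-up placeholders, and falling back to all chains when nothing is mapped is an assumption about what "if any" means here.

```python
# Placeholder per-chain sequences; the residues are invented for illustration.
example_sequences = {
    "A": "MQLNSTEISE",
    "B": "MQLNSTEISE",
    "C": "MQLNSTEISE",
    "D": "MATGKIVQVI",
}


def select_mapped(chain_sequences, mapped_chains):
    """Hypothetical helper: keep only the sequences of the mapped chains.

    Assumption: an empty mapping means "analyze every chain".
    """
    if not mapped_chains:
        return dict(chain_sequences)
    missing = [c for c in mapped_chains if c not in chain_sequences]
    if missing:
        raise KeyError("chains not found in structure: " + ", ".join(missing))
    return {c: chain_sequences[c] for c in mapped_chains}


selected = select_mapped(example_sequences, ["A", "B", "C"])
print(sorted(selected))
```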

Parse the structure to work with the Biopython Structure object

Parsing the structure extracts the sequence of each chain and stores it in the chains attribute. It also returns an object that exposes the Biopython Structure (and its first model), which opens up all the methods Biopython provides for structures.

In [ ]:
parsed_structure = my_structure.parse_structure()
print(type(parsed_structure.structure))
print(type(parsed_structure.first_model))
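
To make "parse the sequences of each chain" concrete, here is a standard-library sketch that reads residues from fixed-column PDB ATOM records and builds one sequence per chain. It is a stand-in for what a structure parser does, not how Biopython or ssbio implements it; `atom_line` and `chain_sequences` are hypothetical helpers, and the five atoms are synthetic.

```python
THREE_TO_ONE = {
    "ALA": "A", "ARG": "R", "ASN": "N", "ASP": "D", "CYS": "C",
    "GLN": "Q", "GLU": "E", "GLY": "G", "HIS": "H", "ILE": "I",
    "LEU": "L", "LYS": "K", "MET": "M", "PHE": "F", "PRO": "P",
    "SER": "S", "THR": "T", "TRP": "W", "TYR": "Y", "VAL": "V",
}


def atom_line(serial, name, resname, chain, resseq, altloc=" "):
    """Hypothetical helper: format a minimal ATOM record with zeroed coordinates."""
    return ("ATOM  {:>5} {:<4}{}{:>3} {}{:>4}    {:8.3f}{:8.3f}{:8.3f}{:6.2f}{:6.2f}"
            .format(serial, name, altloc, resname, chain, resseq,
                    0.0, 0.0, 0.0, 1.0, 0.0))


def chain_sequences(pdb_lines):
    """Return {chain_id: one-letter sequence} from ATOM records."""
    sequences = {}
    last_residue = {}
    for line in pdb_lines:
        if line[0:6] != "ATOM  ":
            continue
        chain_id = line[21]
        residue_key = (line[22:26], line[26])  # residue number + insertion code
        if last_residue.get(chain_id) == residue_key:
            continue  # another atom of a residue that was already counted
        last_residue[chain_id] = residue_key
        letter = THREE_TO_ONE.get(line[17:20], "X")  # "X" for unknown residues
        sequences[chain_id] = sequences.get(chain_id, "") + letter
    return sequences


# Synthetic records: two residues in chain A, two in chain B.
lines = [
    atom_line(1, "N", "MET", "A", 1),
    atom_line(2, "CA", "MET", "A", 1),
    atom_line(3, "CA", "GLY", "A", 2),
    atom_line(4, "CA", "SER", "B", 1),
    atom_line(5, "CA", "LYS", "B", 2),
]
sequences = chain_sequences(lines)
print(sequences)
```

Note that sequences built this way only contain residues with resolved coordinates, so they can be shorter than the full sequence of the protein.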

Clean and save the structure

Cleaning a structure does the following:

- Add missing chain identifiers to a PDB file
- Select a single chain if noted
- Remove alternate atom locations
- Add atom occupancies
- Add B (temperature) factors (default Biopython behavior)

In the example below, we will clean the structure so it only includes our mapped chains.

In [ ]:
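# Use the same portable temporary directory as the download step (tempfile is imported above)
cleaned_structure = my_structure.clean_structure(outdir=tempfile.gettempdir(),
                                                 keep_chains=my_structure.mapped_chains,
                                                 force_rerun=True)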
cleaned_structure
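
The cleaning steps listed above can be sketched on a toy list of atoms. This is an illustration under stated assumptions, not the ssbio implementation: `clean_atoms` is a hypothetical helper, atoms are represented as plain dictionaries, and the defaults (keep alternate location `A`, use chain ID `X` for blank chains, fill missing occupancies with 1.0) are assumptions chosen for the example.

```python
def clean_atoms(atoms, keep_chains=None, keep_altloc="A", default_chain="X"):
    """Hypothetical helper: apply the cleaning steps to a list of atom dicts."""
    cleaned = []
    for atom in atoms:
        atom = dict(atom)  # work on a copy so the input is left untouched
        if not atom["chain"].strip():
            atom["chain"] = default_chain      # add a missing chain identifier
        if keep_chains and atom["chain"] not in keep_chains:
            continue                           # drop chains we did not ask for
        if atom["altloc"] not in (" ", keep_altloc):
            continue                           # drop other alternate locations
        atom["altloc"] = " "
        if atom["occupancy"] is None:
            atom["occupancy"] = 1.0            # add a missing occupancy
        cleaned.append(atom)
    return cleaned


# Synthetic atoms: one alternate location pair, one unwanted chain, one blank chain.
atoms = [
    {"chain": "A", "name": "CA", "altloc": " ", "occupancy": 1.0},
    {"chain": "A", "name": "CB", "altloc": "A", "occupancy": 0.6},
    {"chain": "A", "name": "CB", "altloc": "B", "occupancy": 0.4},
    {"chain": "D", "name": "CA", "altloc": " ", "occupancy": 1.0},
    {"chain": " ", "name": "CA", "altloc": " ", "occupancy": None},
]
cleaned = clean_atoms(atoms, keep_chains=["A", "X"])
for atom in cleaned:
    print(atom)
```

In the notebook, the same kind of selection is requested through the `keep_chains` argument of `clean_structure`.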

Viewing the structure

In [ ]:
# The original structure
my_structure.view_structure(recolor=False)
In [ ]:
# The cleaned structure
import nglview
nglview.show_structure_file(cleaned_structure)